Country:
- North America > Canada > Alberta (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Genre:
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Industry:
- Information Technology (0.67)
- Leisure & Entertainment > Games > Computer Games (0.46)
Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration
Reinforcement learning (RL) algorithms are typically based on optimizing a Markov Decision Process (MDP) via the optimal Bellman equation. Recent studies have shown that restricting the optimization of Bellman equations to in-sample actions tends to yield more stable optimization, especially in the presence of function approximation. Building on these findings, we propose an Empirical MDP Iteration (EMIT) framework, which iterates over a sequence of empirical MDPs constructed from the transitions in the replay memory. For each empirical MDP, it learns an estimated Q-function, denoted \widehat{Q}. The key strength is that, by restricting the Bellman update to in-sample bootstrapping, each empirical MDP converges to a unique optimal \widehat{Q} function.
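To make the idea concrete, here is a minimal tabular sketch of an in-sample Bellman backup over an empirical MDP built from replay-memory transitions. This is an illustration of the general technique, not the authors' implementation; the function name and data layout are assumptions.

```python
from collections import defaultdict

def empirical_mdp_q_iteration(transitions, gamma=0.99, iters=100):
    """Q-iteration on the empirical MDP induced by replay-memory data.

    transitions: list of (s, a, r, s_next, done) tuples.
    Returns a dict mapping (state, action) to the estimated \widehat{Q} value.
    """
    # Record which actions were actually observed at each state
    # (the "in-sample" actions of the empirical MDP).
    in_sample = defaultdict(set)
    for s, a, r, s2, done in transitions:
        in_sample[s].add(a)

    Q = defaultdict(float)  # \widehat{Q}, initialized to zero
    for _ in range(iters):
        new_q = defaultdict(float)
        for s, a, r, s2, done in transitions:
            if done or not in_sample[s2]:
                target = r
            else:
                # Bootstrap only over actions observed at s2 (in-sample max),
                # never over unseen, out-of-distribution actions.
                target = r + gamma * max(Q[(s2, a2)] for a2 in in_sample[s2])
            new_q[(s, a)] = target
        Q = new_q
    return Q
```

Because the max ranges only over actions present in the data, the backup is a contraction on a finite table and converges to a unique fixed point, which is the property the abstract highlights. (This sketch assumes deterministic rewards; repeated (s, a) pairs with different rewards would need averaging.)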
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)